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ABSTRACT 


This  report  documents  a continuing  Ivestlgatlon  of 
two  recent  programs  for  the  solution  of  AX  = B where  A 
is  symmetric  and  sparse:  the  Yale  Sparse  Matrix  Package 
and  the  Munksgaard  subroutines.  These  two  programs  com- 
pute direct  solutions  In  core  using  triangular 
decomposition  and  Gaussian  elimination,  respectively. 
Their  performance  Is  compared  with  that  of  an  out-of-core 
Cholesky  algorithm  equation  solver,  CSKYDG2. 


As  would  be  expected,  the  two  In-core  equation 
solvers  are  much  faster  than  CSKYDGZ.  The  Yale  symmetric 
subroutines  range  up  to  six  times  faster  than  the 
Munksgaard  subroutines.  All  three  equation  solvers 
provide  the  same  degree  of  accuracy.  However,  the  two 
In-core  equation  solvers  require  such  enormous  amounts  of 
core  storage  that  their  use  Is  not  recommended  on  the 
CDC  6000  series  of  computers  In  their  present  form. 

While  CSKYDG2  requires  less  core  storage.  Its  runs  on  the 
CDC  6400  cost  more  because  of  the  repeated  use  of  the 
random  access  storage  capability. 


INTRODUCTION 


One  of  the  long  range  projects  of  the  Computation,  Mathematics,  and 
Logistics  Department  has  been  the  development  of  mathematical  subroutines 
suitable  for  use  In  the  computer-aided  structural  analysis  of  ships. 

Many  unrelated  efforts  in  both  government  and  Industry  have  resulted  In 
computer  programs  that  treat  particular  classes  of  structural  problems. 
These  programs  often  Involve  the  solution  of  similar  mathematical 
problems  but,  since  the  solutions  are  reached  Independently,  the  efficiency 
and  accuracy  of  the  various  algorithms  used  may  vary  greatly.  The  need 
to  coordinate  these  diverse  efforts,  to  develop  Improved  methods  of  more 
general  applicability,  and  to  produce  more  comprehensive  programs  for 
solving  Navy  structural  problems  became  obvious.  A project  was  therefore 
established  to  coordinate  research  efforts  Involving  mathematical  and 
computational  methods  in  the  area  of  structural  mechanics  and  to  Integrate 
the  work  of  mathematicians,  computer  specialists,  and  structural  engineers 
In  this  field. 

The  present  considerable  Interest  In  the  finite  element  approach  to 
structural  analysis  Is  evidenced  by  the  widespread  use  of  NASTRAN  (NAsa 


STRuctural  ANalysls  program)  and  other  such  programs.  According  to  the 

1* 

NASTRAN  Theoretical  Manual,  "From  a theoretical  viewpoint,  the  formula- 
tion of  a static  structural  problem  for  solution  by  the  displacement 
method  is  completely  described  by  the  matrix  equation  KU=P."  Thus,  there 
is  a need  for  accurate  efficient  computer  subroutines  capable  of  solving 
these  large  sparse  positive  definite  systems  of  simultaneous  linear 
equations.  However,  the  order  of  K is  often  so  large  that,  even  when 
advantage  is  taken  of  K's  symmetry  and  banded  structure,  it  is  not  feasible 
and  sometimes  not  even  possible,  to  store  K in  the  core  memory  of  a 
computer. 

2 

This  report  continues  the  investigation  of  two  recently  developed 

programs  for  solving  KU=P,  where  K is  a matrix  of  very  large  order.  It 

3 

documents  the  testing  of  these  programs  together  with  a third  program 

in  order  to  compare  their  performances  in  solving  large  order  systems 

4 

taken  from  the  finite  element  approach  to  structural  analysis. 

THE  PROGRAMS 

The  following  programs  are  considered  in  this  report: 

• The  Munksgaard  collection  of  subroutines^  implements  Duff's 

algorithm  for  solving  sparse  symmetric  systems.  It  can  obtain  a stable 

decomposition  of  K in  both  the  definite  and  indefinite  cases.  This 

program  was  developed  at  the  Technical  University  of  Denmark  and  the 

Atomic  Energy  Research  Establishment  of  the  United  Kingdom. 

7 8 

• The  Yale  Sparse  Matrix  Package  ’ is  a collection  of  sub- 
routines which  can  be  used  to  solve  both  symmetric  and  non-symmetric 

7 8 

systems.  It  uses  algorithms  developed  by  Elsenstat,  et  al.  ’ Limita- 
tions are  placed  on  the  distribution  and  use  of  these  programs  by  the 
Department  of  Computer  Science,  Yale  University. 

• CSKYDG2  is  an  out-of-core  Cholesky  algorithm  equation  solver 

3 

developed  by  the  present  author.  It  makes  use  of  auxiliary  storage  via 
the  random  access  capabilities  of  the  CDC  6000  series  of  computers. 


*A  complete  listing  of  references  is  given  on  page  9. 
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THE  TEST  EXAMPLES 


The  table  consists  of  the  tabulated  solution  times  for  AX  » B where 

U* 

A Is  a matrix  constructed  from  the  connection  tables  of  Everstlne.  We 
construct  A In  the  following  manner.  Let  c be  some  positive  constant. 

The  off-dlagonal  non-zero  elements  of  A are  then  set  equal  to  -c.  If 
there  are  I such  off-dlagonal  elements  In  a given  row  of  A,  then  the 
diagonal  element  of  that  row  Is  set  equal  to  (il+l)c.  This  ensures  that  A 
Is  positive  definite.  Next  set  all  the  coordinates  of  B,  the  right-hand 
side  of  the  system,  equal  to  c.  Then  all  coordinates  of  the  solution 
vector  X are  equal  to  1. 


TABLE  NOTATION 

The  computation  described  In  this  report  was  done  In  the  spring  of 
1978  on  the  DTNSRDC  CDC  6400  with  844  disk  drives  using  the  FTN  4.6+433 
FORTRAN  compiler  and  the  NOSIBE  1.0  F+34  operating  system.  The  times, 
given  in  terms  of  CDC  6400  CPU  seconds,  were  obtained  using  the  timing 
subroutine  SECOND. 

The  column  headings  are  defined  as  follows: 

N - the  order  of  the  system 

M - the  system  bandwidth 

MATRIX  DENSITY  (%)  - the  ratio  of  the  number  of  non-zero  elements 
to  N^  multiplied  by  100 

INDANL  TIME  - the  time  required  for  the  INDANL  subroutine  of  the 
Munksgaard  program  to  permute  and  factor  the  coefficient 
matrix  A thus 

B - PAP^ 

B - (E+L)D(E+L^) 

where  P Is  a permutation  matrix  chosen  so  ^s  to  "satisfy  a 
stability  criterion  and  have  the  least  possible  'sparsity- 
cost'",^  D and  E are  symmetric  block  diagonal  matrices  whose 
blocks  are  of  order  1 or  2,  and  L Is  strictly  lower 
triangular. 

*N0TE:  The  connection  tables  of  Everstlne  are  available  locally  on 
a CDC  6400  permanent  file. 


i 

I 

INDOPR  TIME  - the  time  required  for  the  INDOPR  subroutine  of  the 
Munksgaard  program  to  solve  the  system 

(E+L)D(E+L^)P  Y = Pb 

by  forward  and  backward  substitutions  and  obtain  X from 

T 

X = P Y 

TOTAL  TIME  - the  sum  of  INDANL  TIME  and  INDOPR  TIME. 

ORDV  TIME  - the  time  required  for  the  ORDV  subroutine  from  the 
Yale  package  to  reorder  the  system  using  the  maximum  degree 
ordering  algorithm  with  threshold  search.^ 

SDRV  TIME  - the  time  required  for  the  SDRV  subroutine  from  the 
Yale  package  to  compute  the  LDL'^  decomposition  of  the 
permuted  matrix  A and  obtain  the  solution  X of  the  original 
system  AX  > B using  forward  and  backward  substitution  as  done 
previously. 

TOTAL  TIME  - the  sum  of  ORDV  TIME  and  SDRV  TIME. 

SETUP  TIME  - the  time  required  by  the  SETUP  preprocessor 

subroutine  for  CSKYDG2  to  read  the  coefficient  matrix  A from 
tape  and  pack  It  In  blocks  In  random  access  storage. 

CSKYDG2  TIME  - the  time  required  by  the  CSKYDG2  subroutine  to 
factor  the  coefficient  matrix  A using  the  Cholesky  decompo- 
sition and  obtain  the  solution  of 

T 

LL  X = B 

by  forward  and  backward  substitution. 

TOTAL  TIME  - the  sum  of  SETUP  TIME  and  CSKYDG2  TIME. 
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PROGRAM  LISTINGS 


A listing  of  the  Munksgaard  program  suitably  modified  for  the  GDC 

3 

6000  series  of  computers  will  be  found  in  Gignac.  Listings  for  CSKYDG2 

4 7 

and  the  Yale  program  will  be  found  in  Gignac  and  Eisenstat  et  al., 
respectively.  The  reader  is  reminded  that  there  are  restrictions  on  the 
use  and  distribution  of  the  Yale  program. 

OBSERVATIONS  AND  CONCLUSIONS 

An  examination  of  the  times  in  the  table  indicates  that  the  Yale 
program  is  the  fastest  by  far  of  all  three  programs,  ranging  up  to  six 
times  faster  than  the  Munksgaard  program  (total  time) . The  Munksgaard 
program  itself  can  be  four  to  five  times  faster  than  CSKYDG2  (total  time) . 
To  some  extent  the  time  difference  between  the  first  two  programs  can  be 
ascribed  to  the  fact  that  the  Yale  program  resequences  the  system  prior 
to  solution,  while  the  Denmark  program  reorders  while  solving  the  system. 
In  any  event  the  first  two  programs,  since  they  do  everything  in  core, 
have  a tremendous  time  advantage  over  CSKYDG2  which  makes  considerable 
use  of  random  access  storage. 

However,  a field  length  of  SOOOOOg  CM  was  required  to  obtain  some 
of  the  Yale  and  Munksgaard  times  in  the  table,  even  though  the  simplest 
of  driving  programs  was  used.  Clearly  the  use  of  some  device  such  as 
"overlay"  would  be  required  to  use  either  the  Yale  or  the  Munksgaard 
program  in  practical  work.  These  two  programs  are  really  suited  for 
computers,  such  as  the  Texas  Instruments  Advanced  Scientific  Computer, 
which  have  abundant  core  storage,  although  the  programs  would  have  to  be 
rewritten  for  the  most  part  to  obtain  the  full  benefit  of  the  computer's 
optimizing  capability.  For  further  testing  on  the  CDC  6000  series  these 
two  programs  should  definitely  be  modified  by  the  introduction  of 
Integer  packing  and  unpacking  subroutines  to  save  core  storage  by  cutting 
down  the  size  of  the  integer  arrays. 

3 

Since  the  CSKYDG2  program  is  profile  oriented,  its  performance  is 

9 10 

greatly  Improved  when  preceded  by  a preprocessor  such  as  BANDIT  ’ 
which  resequences  the  system  to  minimize  profile.  For  example  consider 
the  following  data: 


N.M 

SETUP 

CSKYDG2 

TOTAL 

(from  table) 

209,185 

4.789 

23.140 

27.929 

(after  BANDIT 

resequencing) 

209,185 

4.679 

4.224 

8.903 

(fiom  table) 

221,188 

6.413 

20.138 

26.551 

(after  BANDIT 

resequencing) 

221,188 

5.242 

1.993 

7.235 

Since  the  Yale  and  Munksgaard  programs  are  not  profile  oriented,  their 
performance  is  not  significantly  affected  by  the  BANDIT  resequencing. 

All  three  programs  provided  the  same  degree  of  accuracy  for  the 
computed  solutions — some  thirteen  or  so  decimal  places. 
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